221 research outputs found

    De novo construction of polyploid linkage maps using discrete graphical models

    Full text link
    Linkage maps are used to identify the location of genes responsible for traits and diseases. New sequencing techniques have created opportunities to substantially increase the density of genetic markers. Such revolutionary advances in technology have given rise to new challenges, such as creating high-density linkage maps. Current multiple testing approaches based on pairwise recombination fractions are underpowered in the high-dimensional setting and do not extend easily to polyploid species. We propose to construct linkage maps using graphical models either via a sparse Gaussian copula or a nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes, and the order of markers in each LG are determined by inferring the conditional independence relationships among large numbers of markers in the genome. Through simulations, we illustrate the utility of our map construction method and compare its performance with other available methods, both when the data are clean and contain no missing observations and when data contain genotyping errors and are incomplete. We apply the proposed method to two genotype datasets: barley and potato from diploid and polypoid populations, respectively. Our comprehensive map construction method makes full use of the dosage SNP data to reconstruct linkage map for any bi-parental diploid and polyploid species. We have implemented the method in the R package netgwas.Comment: 25 pages, 7 figure

    netgwas: An R Package for Network-Based Genome-Wide Association Studies

    Full text link
    Graphical models are powerful tools for modeling and making statistical inferences regarding complex associations among variables in multivariate data. In this paper we introduce the R package netgwas, which is designed based on undirected graphical models to accomplish three important and interrelated goals in genetics: constructing linkage map, reconstructing linkage disequilibrium (LD) networks from multi-loci genotype data, and detecting high-dimensional genotype-phenotype networks. The netgwas package deals with species with any chromosome copy number in a unified way, unlike other software. It implements recent improvements in both linkage map construction (Behrouzi and Wit, 2018), and reconstructing conditional independence network for non-Gaussian continuous data, discrete data, and mixed discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely occur in genetics and genomics such as genotype data, and genotype-phenotype data. We demonstrate the value of our package functionality by applying it to various multivariate example datasets taken from the literature. We show, in particular, that our package allows a more realistic analysis of data, as it adjusts for the effect of all other variables while performing pairwise associations. This feature controls for spurious associations between variables that can arise from classical multiple testing approach. This paper includes a brief overview of the statistical methods which have been implemented in the package. The main body of the paper explains how to use the package. The package uses a parallelization strategy on multi-core processors to speed-up computations for large datasets. In addition, it contains several functions for simulation and visualization. The netgwas package is freely available at https://cran.r-project.org/web/packages/netgwasComment: 32 pages, 9 figures; due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF fil

    Identifying overlapping terrorist cells from the Noordin Top actor-event network

    Get PDF
    Actor-event data are common in sociological settings, whereby one registers the pattern of attendance of a group of social actors to a number of events. We focus on 79 members of the Noordin Top terrorist network, who were monitored attending 45 events. The attendance or non-attendance of the terrorist to events defines the social fabric, such as group coherence and social communities. The aim of the analysis of such data is to learn about the affiliation structure. Actor-event data is often transformed to actor-actor data in order to be further analysed by network models, such as stochastic block models. This transformation and such analyses lead to a natural loss of information, particularly when one is interested in identifying, possibly overlapping, subgroups or communities of actors on the basis of their attendances to events. In this paper we propose an actor-event model for overlapping communities of terrorists, which simplifies interpretation of the network. We propose a mixture model with overlapping clusters for the analysis of the binary actor-event network data, called {\tt manet}, and develop a Bayesian procedure for inference. After a simulation study, we show how this analysis of the terrorist network has clear interpretative advantages over the more traditional approaches of affiliation network analysis.Comment: 24 pages, 5 figures; related R package (manet) available on CRA

    A penalized inference approach to stochastic block modelling of community structure in the Italian Parliament

    Get PDF
    We analyse bill cosponsorship networks in the Italian Chamber of Deputies. In comparison with other parliaments, a distinguishing feature of the Chamber is the large number of political groups. Our analysis aims to infer the pattern of collaborations between these groups from data on bill cosponsorships. We propose an extension of stochastic block models for edge-valued graphs and derive measures of group productivity and of collaboration between political parties. As the model proposed encloses a large number of parameters, we pursue a penalized likelihood approach that enables us to infer a sparse reduced graph displaying collaborations between political parties

    Convergence properties of multi-environment causal regularization

    Full text link
    Causal regularization was introduced as a stable causal inference strategy in a two-environment setting in \cite{kania2022causal}. We start with observing that causal regularizer can be extended to several shifted environments. We derive the multi-environment casual regularizer in the population setting. We propose its plug-in estimator, and study its concentration in measure behavior. Although the variance of the plug-in estimator is not well-defined in general, we instead study its conditional variance both with respect to a natural filtration of the empirical as well as conditioning with respect to certain events. We also study generalizations where we consider conditional expectations of higher central absolute moments of the estimator. The results presented here are also new in the prior setting of \cite{kania2022causal} as well as in \cite{Rot}

    Inferring slowly-changing dynamic gene-regulatory networks

    Get PDF
    Dynamic gene-regulatory networks are complex since the interaction patterns between their components mean that it is impossible to study parts of the network in separation. This holistic character of gene-regulatory networks poses a real challenge to any type of modelling. Graphical models are a class of models that connect the network with a conditional independence relationships between random variables. By interpreting these random variables as gene activities and the conditional independence relationships as functional non-relatedness, graphical models have been used to describe gene-regulatory networks. Whereas the literature has been focused on static networks, most time-course experiments are designed in order to tease out temporal changes in the underlying network. It is typically reasonable to assume that changes in genomic networks are few, because biological systems tend to be stable. We introduce a new model for estimating slow changes in dynamic gene-regulatory networks, which is suitable for high-dimensional data, e.g. time-course microarray data. Our aim is to estimate a dynamically changing genomic network based on temporal activity measurements of the genes in the network. Our method is based on the penalized likelihood with l1-norm, that penalizes conditional dependencies between genes as well as differences between conditional independence elements across time points. We also present a heuristic search strategy to find optimal tuning parameters. We re-write the penalized maximum likelihood problem into a standard convex optimization problem subject to linear equality constraints. We show that our method performs well in simulation studies. Finally, we apply the proposed model to a time-course T-cell dataset

    Efficient implementation of sets and multisets in R using hash tables

    Full text link
    The package hset for the R language contains an implementation of a S4 class for sets and multisets of numbers. The implementation, based on the hash table data structure from the package hash (Brown, 2019), allows for quick operations when the set is a dynamic object. An important example is when a set or a multiset is part of the state of a Markov chain in which in each iteration various elements are moved in and out of the set
    • …
    corecore